The genome sequence of a sawfly, Macrophya alboannulata (Costa, 1859)

We present a genome assembly from an individual female Macrophya alboannulata (sawfly; Arthropoda; Insecta; Hymenoptera; Tenthredinidae). The genome sequence is 245.2 megabases in span. Most of the assembly is scaffolded into 8 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 23.17 kilobases in length. Gene annotation of this assembly on Ensembl identified 24,359 protein coding genes.


Background
There are approaching 300 Macrophya species globally, of which ten are present in Britain. The genus is characterised by elongate hind femora and coxae. Macrophya alboannulata (Costa, 1859) is within the subgenus Macrophya. In addition to the subgenera, several species groupings have been named and M. alboannulata is within the Macrophya epinota group together with Macrophya albicincta (Schrank, 1776) and Macrophya ribis (Schrank, 1781). Many Macrophya species are, as yet, unassigned to any subgenus.
Macrophya alboannulata was split from M. albicincta by Chevin (1975), and so its true historic status in Britain is not entirely clear. Some older M. albicincta records may have been uncritically, or erroneously, redetermined to M. alboannulata as the number of records is higher than might be expected. However, most recent records under the name M. alboannulata would seem to be named correctly (Musgrove, 2023). Liston (1983) noted that M. alboannulata seemed to be found mostly in southern Britain, but there are two recent Scottish records.
Macrophya alboannulata is a relatively large (10-11 mm) Macrophya species. The insect is black, marked with white to varying degrees on the clypeus, labrum and postocellar region, the pronotum, legs and the margins of some abdominal tergites. Whilst similar to M. albicincta and M. ribis, M. alboannulata can be recognised in the field with careful examination. Macrophya ribis tends to be smaller and has dense coarse punctures around the frons and temples. The two remaining species can be separated by the colouring of the hind trochanter and trochantellus as described by Chevin (1975). In M. albicincta the hind trochanters are predominantly black and the trochantelli usually white with a black spot, whereas in M. alboannulata the hind trochanters are predominantly marked with white and the trochantelli never have a black spot. In males the labrum of M. albicincta is at least darkened at the lateral edges to mostly black, whereas in M. alboannulata the labrum is at most slightly infuscate at the lateral margins.
Most sawfly species are predatory, or feed on pollen or nectar, but the feeding habits of adult M. alboannulata are unclear. The larvae feed on elder (Sambucus nigra L.), and are not considered a pest of agricultural or horticultural significance in Britain. The species is univoltine with adults on the wing from April to July. This female M. alboannulata from Wytham Woods, England was identified using Benson's key (Benson, 1952) with reference to Chevin's identification characteristics.
Whilst some Macrophya species exhibit BIN sharing, M. alboannulata appears to be a well-defined species with barcoded specimens falling in a single cluster, AAK6380 (BOLD Systems, 2023). Knowledge of sawfly evolution will benefit from the comparative analysis of genomes from closely and distantly related species. This complete genome sequence will help our understanding of the phylogeny of this group.

Genome sequence report
The genome was sequenced from one female Macrophya alboannulata (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.76, -1.33). A total of 101-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 101 missing joins or mis-joins and removed 7 haplotypic duplications, reducing the assembly length by 1.16% and the scaffold number by 54.63%, and increasing the scaffold N50 by 110.89%. The final assembly has a total length of 245.2 Mb in 48 sequence scaffolds with a scaffold N50 of 37.5 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.38%) of the assembly sequence was assigned to 8 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
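For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal Python sketch of how an Nx value is conventionally computed: scaffolds are sorted from longest to shortest, and the Nx is the length of the scaffold at which the running total first reaches x% of the assembly span. The function name nx and the toy lengths are illustrative assumptions; they are not the iyMacAlbo1.1 scaffold lengths, and this is not the code used to produce Table 1.

```python
def nx(lengths, x=50):
    """Return the Nx length for a list of scaffold lengths (in bp).

    Nx is the length L of the scaffold at which the cumulative length of
    scaffolds, taken from longest to shortest, first reaches x% of the total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical toy scaffold lengths in bp (NOT the real assembly).
toy_lengths = [50, 40, 30, 20, 10]

print("N50:", nx(toy_lengths, 50))  # cumulative 50, 90 -> reaches 75 at 40
print("N90:", nx(toy_lengths, 90))  # cumulative 50, 90, 120, 140 -> reaches 135 at 20
```

Applied to the real scaffold lengths, the same calculation would be expected to give the N50 and N90 values reported for the assembly, although dedicated tools may differ in details such as how gaps are counted.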
The resulting annotation includes 24,653 transcribed mRNAs from 24,359 protein-coding genes.
Tissue from the thorax was homogenised using a PowerMasher II tissue disruptor (Denton et al., 2023a). HMW DNA was extracted in the WSI Scientific Operations core using the

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. DNA sequencing was performed by the Scientific Operations core at the WSI on a Pacific Biosciences SEQUEL II instrument. Hi-C data were also generated from head tissue of iyMacAlbo1 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination and corrected as described previously (Howe et al., 2021).
Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2023), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
Table 3 contains a list of relevant software tool versions and sources.

Genome annotation
The BRAKER2 pipeline (Brůna et al., 2021) was used in the default protein mode to generate annotation for the Macrophya alboannulata assembly (GCA_949628255.1) in Ensembl Rapid Release.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that

Software tool Version
All of the methods are clearly explained within the text as well as within the methods section. This is a very good report that will be useful to anyone performing comparative genomics within this group of animals and for anyone wishing to research this sawfly specifically.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

Romain Gastineau
University of Szczecin, Szczecin, Poland
The article 'The genome sequence of a sawfly, Macrophya alboannulata (Costa, 1859)' describes the obtention of a rather complete nuclear genome for a female specimen of a species of Hymenoptera sampled in the UK.
The protocol is clearly explained and the depth of sequencing seems to have been sufficient, as it allowed the authors to retrieve a ca. 250 Mb genome in only 48 contigs.
Authors are obvious native English speakers so the language quality of the article reflects this.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
A female Macrophya alboannulata (specimen ID Ox001511, ToLID iyMacAlbo1) was netted in Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.76, longitude -1.33) on 2021-05-31. The specimen was collected and identified by Steven Falk (independent researcher) and then snap-frozen on dry ice. Protocols developed by the Wellcome Sanger Institute (WSI) Tree of Life Core Laboratory have been deposited on protocols.io (Denton et al., 2023b). The workflow for high molecular weight (HMW) DNA extraction at the WSI includes a sequence of procedures: sample preparation, sample homogenisation, DNA extraction, fragmentation, and clean-up. The iyMacAlbo1 sample was weighed and dissected on dry ice (Jay et al., 2023), with tissue set aside for Hi-C sequencing.

Figure 2. Genome assembly of Macrophya alboannulata, iyMacAlbo1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 245,239,222 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (51,530,056 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (37,453,143 and 19,476,270 bp, respectively). The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the hymenoptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Macrophya%20alboannulata/dataset/iyMacAlbo1_1/snail.
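As a rough aid to reading the snail plot, the short Python sketch below works through the bin arithmetic implied by the caption, using only the figures quoted there (assembly span, longest scaffold, 1,000 bins). The variable names are illustrative assumptions, and the calculation is a reader's back-of-envelope check rather than part of the BlobToolKit implementation.

```python
# Values quoted in the Figure 2 caption.
ASSEMBLY_SPAN_BP = 245_239_222
LONGEST_SCAFFOLD_BP = 51_530_056
N_BINS = 1_000  # each bin represents 0.1% of the assembly

# Size of one bin in base pairs (about 245 kb).
bin_size_bp = ASSEMBLY_SPAN_BP / N_BINS

# Share of the whole assembly held in the single longest scaffold,
# and the approximate number of bins that scaffold would occupy.
longest_fraction = LONGEST_SCAFFOLD_BP / ASSEMBLY_SPAN_BP
bins_spanned = LONGEST_SCAFFOLD_BP / bin_size_bp

print(f"bin size: {bin_size_bp:,.0f} bp")
print(f"longest scaffold: {longest_fraction:.2%} of the assembly")
print(f"bins spanned by longest scaffold: {bins_spanned:.0f}")
```

On these figures the longest scaffold alone accounts for roughly a fifth of the assembly span, which is why it occupies a correspondingly large arc of the plot.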

Figure 5. Genome assembly of Macrophya alboannulata, iyMacAlbo1.1: Hi-C contact map of the iyMacAlbo1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=UQaGT2x2RDq3FRlYtdORtg.